Improved Topic Modeling in Twitter Through Community Pooling
Authors
Abstract
Social networks play a fundamental role in the propagation of information and news. Characterizing the content of the messages becomes vital for different tasks, like breaking news detection, personalized message recommendation, fake users detection, information flow characterization and others. However, Twitter posts are short and often less coherent than other text documents, which makes it challenging to apply text mining algorithms to these datasets efficiently. Tweet-pooling (aggregating tweets into longer documents) has been shown to improve automatic topic decomposition, but the performance achieved in this task varies depending on the pooling method. In this paper, we propose a new pooling scheme for topic modelling in Twitter, which groups tweets whose authors belong to the same community (a group of users who mainly interact with each other but not with other groups) on a user interaction graph. We present a complete evaluation of this methodology, state of the art schemes and previous pooling models in terms of cluster quality, document retrieval tasks performance and supervised machine learning classification score. Results show that our Community pooling method outperformed other methods on the majority of metrics in two heterogeneous datasets, while also reducing the running time. This is useful when dealing with big amounts of noisy user-generated social media texts. Overall, our findings contribute to an improved methodology for identifying the latent topics in a Twitter dataset, without the need of modifying the basic machinery of a topic decomposition model.
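The pooling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy tweets, and the interaction edges are all hypothetical, and connected components of the interaction graph are used as a simple stand-in for a real community detection algorithm, which the abstract does not name.

```python
from collections import defaultdict


def find(parent, u):
    # Union-find lookup with path halving.
    while parent[u] != u:
        parent[u] = parent[parent[u]]
        u = parent[u]
    return u


def detect_communities(interactions):
    """Map each user to a community id.

    Assumption: connected components of the interaction graph stand in
    for a proper community detection algorithm.
    """
    parent = {}
    for a, b in interactions:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a
    return {user: find(parent, user) for user in parent}


def community_pool(tweets, interactions):
    """Join tweets whose authors share a community into one document."""
    community_of = detect_communities(interactions)
    pools = defaultdict(list)
    for author, text in tweets:
        # Authors without recorded interactions get a singleton pool.
        key = community_of.get(author, author)
        pools[key].append(text)
    return [" ".join(texts) for texts in pools.values()]


# Hypothetical toy data: (author, text) pairs and (user, user) interactions.
tweets = [
    ("ana", "election debate tonight"),
    ("bob", "who won the debate"),
    ("cat", "new phone released"),
    ("dan", "phone camera review"),
    ("eve", "rainy day"),
]
interactions = [("ana", "bob"), ("cat", "dan")]

documents = community_pool(tweets, interactions)
```

Each pooled document would then be passed to an unmodified topic model in place of the individual tweets, which matches the abstract's point that the decomposition machinery itself does not need to change.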
Similar resources
Twitter Topic Modeling by Tweet Aggregation
Conventional topic modeling schemes, such as Latent Dirichlet Allocation, are known to perform inadequately when applied to tweets, due to the sparsity of short documents. To alleviate these disadvantages, we apply several pooling techniques, aggregating similar tweets into individual documents, and specifically study the aggregation of tweets sharing authors or hashtags. The results show that ...
Characterizing Twitter Discussions About HPV Vaccines Using Topic Modeling and Community Detection
BACKGROUND In public health surveillance, measuring how information enters and spreads through online communities may help us understand geographical variation in decision making associated with poor health outcomes. OBJECTIVE Our aim was to evaluate the use of community structure and topic modeling methods as a process for characterizing the clustering of opinions about human papillomavirus ...
Topic Modeling in Twitter: Aggregating Tweets by Conversations
We propose a new pooling technique for topic modeling in Twitter, which groups together tweets occurring in the same user-to-user conversation. Under this scheme, tweets and their replies are aggregated into a single document and the users who posted them are considered co-authors. To compare this new scheme against existing ones, we train topic models using Latent Dirichlet Allocation (LDA) an...
Online Topic Modeling for Real-Time Twitter Search
This paper discusses the work done by a team at the University of Florida for the TREC 2011 Microblog Track. To build a real-time microblog search engine we rely on topic modeling for our search. To facilitate our algorithms we bundle similar tweets together in what we call supertweet generation. We perform online inference and offline inference depending on the time frame of the topical query....
Assignment 2: Twitter Topic Modeling with Latent Dirichlet Allocation Background
In this assignment we are going to implement a parallel MapReduce version of a popular topic modeling algorithm called Latent Dirichlet Allocation (LDA). Because it allows for exploring vast document collections, we are going to use this algorithm to see if we can automatically identify topics from a series of Tweets. For the purpose of this assignment, we are going to treat every tweet as a docu...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-86692-1_17